Determining the intention of the test developer is fundamental to the choice 
of the analytical model for a rating scale. For confirmatory analysis, these 
intentions inform the choice of the general form of the model, representing the 
manner in which the respondent interacts with the scale, and also of the precise 
statement of that form, representing the intention of the analyst to construct, 
say, an "equal-interval" scale. Examples of general forms and precise statements 
are given. 

Introduction: The nature of a Likert rating scale 

The construction of a rating scale is rarely haphazard, but is rather the result 
of careful thought by the test developer, who has in mind a firm idea of the 
manner in which the categories represent different levels of the intended 
variable of knowledge, attitude, experience etc. For instance, the test 
developer may intend to construct an equal-interval scale in order to study, say, 
"completion of homework assignments". Figure 1 then portrays the test 
developer's intention. The categories of the rating scale are equally spaced, 
reflecting the intention that the distances between the categories represent 
equal distances in the function of the category definitions. As Likert (1932) 
demonstrates, this straightforward conceptualization has considerable merit. 

Closer examination of the scale, however, reveals that this approach may be 
somewhat naive. Some children who are rated "none" may regularly almost complete 
an assignment or two; other children, also rated "none", may never have started 
even one assignment. We are forced to the conclusion that each extreme category 
of the scale must represent an infinite continuum of performance above or below 
the scale, and further that it is unlikely that our definition of the 
intermediate categories is exactly equal-interval. 

Figure 2 portrays graphically what this might mean in terms of actual zones of 
performance on the necessarily infinite continuum of the underlying variable. 
Each extreme category (1 or 5 in this case) represents a conceptually infinite 
range of performance. Intermediate categories (2, 3 or 4 in this case) represent 
ranges of performance for which the end-points are determined by the definitions 
of the adjacent categories. Empirically, no two intermediate categories will 
have ranges of exactly the same length on the infinite variable. 

Consideration of rating scale observations has often proceeded in one of two 
directions. Much analysis ignores the nature of the infinite continuum and 
follows Likert in treating the ratings themselves as an equal-interval scale. 
This frequently produces useful, if approximate, results, but can be misleading 
if the observations are not towards the center of the scale, or the scale layout 
is far from uniform. Other analysis has appreciated the infinite nature of the 
continuum, but has proceeded in an exploratory manner, allowing the 
idiosyncrasies of the data to mandate the form of the scale. 

This paper is a step towards resolving this conflict between the developer's 
intention and the empirical realization of the rating scale, suggesting that a 
confirmatory approach to rating scale analysis be considered, one in which the 
developer's intentions play the major role in the analysis. 

Following this line of reasoning, the problem of modelling rating scales has 
several aspects: 

1) the measurement problem: 

What is the general form of the measurement model which gives objective 
calibrations based on the developer's conception of the data? 

2) the problem of intention: 

In constructing the scale, the developer had some intention, usually 
expressed in terms such as "the categories should be equally spaced". In what 
way can these intentions be expressed as a special form of the measurement model, 
so that calibrations based on the developer's intentions can be obtained, 
together with fit statistics and other diagnostic information as to the extent 
to which the empirical data reflect those intentions? 

3) the communication problem: 

How can interaction between the developer, the user and the data be 
promoted so that the greatest advantage is taken of both the information provided 
by the developer and that provided by each particular, but somewhat 
idiosyncratic, set of observations? 



I. The form of the measurement model 

1. The dichotomous case 

The simplest form of the rating scale is the dichotomous item, for which the 
scale has merely two categories, 0 and 1. For convenience of conceptualisation, 
we can think of a person responding to a test item with 1 (a "success") 
representing a higher performance level than 0 (a "failure"). 

Converting such ordinal observations (counts) into linear measures, based on a 
linear combination of the underlying parameters, requires a model of the form: 

$$F(P_{ni}) = B_n - D_i \qquad (1)$$

where 

$P_{ni}$ is the probability of success of person $n$ on item $i$ 

$B_n$ is a parameter representing the ability measure of person $n$, where 
$n = 1, \ldots, N$ 

$D_i$ is a parameter representing the difficulty calibration of item $i$, where 
$i = 1, \ldots, L$ 

$F()$ is a function which monotonically transforms a value in the range 
(0,1) into a value in the range -infinity to +infinity. 

The precise form of the function $F()$ for the dichotomous case, of which a general 
shape is pictured in Figure 3, has implications for the principles underlying the 
modelling of rating scales. 



The measurement model derived by Rasch can be expressed as 

$$\log\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i \qquad (2)$$

in which the estimation is based on the ratio of the probability of success to 
the probability of failure, that is, on the logarithm of the odds. 

An alternative model which would appear to be just as useful is: 

$$\tan\left(\frac{\pi}{2}\bigl(P_{ni} - (1 - P_{ni})\bigr)\right) = B_n - D_i \qquad (3)$$

in which the comparison is based on the difference between the probability of 
success and the probability of failure. 

However, though both models express each component (person or item) by one 
parameter, and are of linear form, there is a fundamental difference in their 
statistical properties. In Rasch's model, the person parameters can be 
conditioned out of the estimation of the item parameters, and vice versa. Thus, 
ignoring the statistical bias introduced by the possibility of extreme scores, 
it is not necessary to know which particular persons answered an item correctly 
in order to estimate the item difficulty. The margins of the response matrix, the 
raw scores, are sufficient statistics for estimating the parameters. 

For the inverse tangent model, no sufficient statistics exist, so that, in order 
to estimate the parameters, it is necessary to know the details of the responses 
made. This threatens the basic concept of useful measurement, that the measure 
be essentially independent of the details of the device used to obtain it. 
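
The contrast between models (2) and (3) can be illustrated numerically. The 
following is a minimal sketch, not part of the original analysis: the function 
names and the item difficulties are hypothetical. For a person who succeeds on 
exactly one of two items, it computes the probability that the success was on 
the first item. Under the Rasch model this conditional probability does not 
depend on the person's ability; under the inverse tangent model it does. 

```python
import math

def p_rasch(b, d):
    """Rasch model (2): the log-odds of success equal b - d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))

def p_arctan(b, d):
    """Model (3): tan((2P - 1) * pi / 2) = b - d, solved for P."""
    return 0.5 + math.atan(b - d) / math.pi

def p_first_given_one_success(p_model, b, d1, d2):
    """P(item 1 correct | exactly one of the two items correct)."""
    p1, p2 = p_model(b, d1), p_model(b, d2)
    only_first = p1 * (1.0 - p2)
    only_second = (1.0 - p1) * p2
    return only_first / (only_first + only_second)

d1, d2 = -0.5, 1.0  # hypothetical item difficulties
for b in (-2.0, 0.0, 2.0):  # hypothetical person abilities
    rasch = p_first_given_one_success(p_rasch, b, d1, d2)
    arctan = p_first_given_one_success(p_arctan, b, d1, d2)
    print(f"B = {b:4.1f}  Rasch: {rasch:.4f}  inverse tangent: {arctan:.4f}")
```

The Rasch column is constant across abilities because the ability term cancels 
from the ratio of the odds, which is the sense in which the raw score is a 
sufficient statistic. 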

For a polytomous rating scale, this concept becomes yet more complex, because 
there appear to be many alternative, yet reasonable, ways to express a rating 
scale. Three are considered here, all of which can be developed from the Rasch 
model for the dichotomous case, but with different hypotheses about the nature 
of the measurement situation. 

2. The Andrich model for wholistic scales 

Following Andrich (1978), the model built most closely on the work of Rasch can 
be expressed as 

$$\log\left(\frac{P_{nij}}{P_{ni(j-1)}}\right) = B_n - D_i - F_j \quad \text{for } j = 1, \ldots, J \qquad (4)$$

where 

$P_{nij}$ is the probability of an observation in category $j$ 
$P_{ni(j-1)}$ is the probability of an observation in category $j-1$ 
$B_n$ is the ability of person $n$ 
$D_i$ is the difficulty of item $i$ 

$F_j$ is the step difficulty or threshold between categories $j-1$ and $j$, where 
the categories are numbered, say, $0, \ldots, J$, and where, for the purposes of 
this discussion, all items have the same category structure. 
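
Category probabilities follow from (4) by accumulating the adjacent-category 
log-odds from category 0 upwards and normalizing. The sketch below assumes that 
form; the function name `andrich_probs` and the example values are hypothetical. 

```python
import math

def andrich_probs(b, d, thresholds):
    """Probabilities of categories 0..J under model (4).

    thresholds holds F_1..F_J, so that log(P_j / P_(j-1)) = b - d - F_j.
    """
    log_num = [0.0]  # category 0 is the reference category
    for f in thresholds:
        log_num.append(log_num[-1] + (b - d - f))
    top = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical person, item and thresholds for a 4-category scale (0..3).
probs = andrich_probs(0.5, -0.2, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold of zero the function reduces to the dichotomous Rasch 
model. Reversing the category order corresponds to negating the measure and 
negating and reversing the thresholds. 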

Conceptually, this model requires that the relationship between any two adjacent 
categories is a dichotomous Rasch model. For $J = 1$, the Andrich model becomes the 
Rasch dichotomous model. The conceptual underpinning of the model is that each 
category represents a qualitatively different level of the variable, but that 
comprehension of all levels is required in order to place the person in any one 
of them. Merging adjacent categories in the data together into one category, or 
splitting a category into two adjacent categories, necessarily changes the 
meaning of all the categories, and so the frame of reference of all the 
parameters. For estimates from data which are in accord with the Andrich model, 
collapsing categories lessens the discrimination of the measuring system and so 
contracts the estimates towards the mean, establishing a new frame of reference. 
Thus, even though the category thresholds are parameterized independently, they 
must be considered together when interpreting them. 

The sufficient statistics for the person measures and item calibrations are the 
margins of the score matrix, and the sufficient statistics for the category 
parameters are the gross counts of the number of responses observed in each 
category. Details of particular ratings are not needed for estimation, though they 
are required for analysis of the fit of the data to the model, as always. A 
further important feature of this model is that, if the categories are reversed, 
i.e. counted from the other end of the scale, the measures are merely reversed 
in sign. Since the direction of a scale is arbitrary, this is an essential 
feature for measurement. 

The unavoidable redefinition of the frame of reference when the rating scale is 
amended motivates an exploration for alternative models which allow for the 
addition or removal of categories without grossly disturbing the parameter 
estimates. 



3. The Glas model for incremental scales 

Glas and Verhelst (1989) present a "steps" model for rating scales also based on 
the Rasch dichotomous model. The rating scale item is conceptualized as a 
multi-stage testing item, in which success on the previous category is required 
before a person is considered to have attempted the next higher category. This 
model can be written as: 

$$\log\left(\frac{P_{nij}}{1 - P_{nij}}\right) = B_n - D_i - F_j \quad \text{for } j = 1, \ldots, J \text{ when } X_{ni} \ge j - 1 \qquad (5)$$

where 

$X_{ni}$ is the observation resulting from person $n$ interacting with item $i$. 

Each item is thus considered to be a sequence of notional category-items. The 
easiest category-item is administered first, followed by successively more 
difficult category-items, until either the person fails a category-item or the 
sequence is exhausted. Table 1 depicts the ways in which the possible responses 
on a scale consisting of the 4 categories 0, 1, 2 and 3 are decomposed into 
category-items. 
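
The decomposition into category-items can be sketched as follows. This is an 
illustration of the scoring rule described above, not code from any published 
program; the function name is hypothetical, and `None` stands for a category-item 
that is considered not to be administered. 

```python
def glas_category_items(rating, top_category):
    """Decompose a rating in 0..top_category into notional category-items.

    1 = category-item passed, 0 = failed, None = not administered.
    """
    if not 0 <= rating <= top_category:
        raise ValueError("rating outside the scale")
    items = []
    for step in range(1, top_category + 1):
        if step <= rating:
            items.append(1)      # every step up to the rating was passed
        elif step == rating + 1:
            items.append(0)      # the first step beyond the rating was failed
        else:
            items.append(None)   # later steps were never reached
    return items

for rating in range(4):
    scored = glas_category_items(rating, 3)
    length = sum(value is not None for value in scored)
    print(rating, scored, "length of item test:", length)
```

The resulting records, with missing values, are of the kind that any dichotomous 
Rasch estimation procedure allowing missing data could accept. 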

For $J = 1$, Glas's model is also the Rasch dichotomous model. But since each 
category-item is modelled to fit the Rasch dichotomous model, local independence 
is required to exist, conceptually, across the category-items comprising each 
rating scale item. Consequently estimates of measures for the Glas model can be 
obtained using any software for estimation of the Rasch dichotomous model which 
allows missing data. If dependency between category-items exists because of the 
sequencing, then this will be reflected in the fit statistics. Though sufficient 
statistics exist for this model, the form of the data is such that fully 
conditional estimation fails. 

The decomposition of the rating scale into category-items, expressed in Table 1, 
is strongly directional and not reversible without changing the meaning of the 
frame of reference and the calibrations in a comprehensive manner. The higher 
up the rating scale a person scores, the more category-items were encountered and 
so the more information is obtained. Reversing the category numbering would 
result in the person being analyzed on a test of different length. For scales 
in which the direction of numbering of the categories is arbitrary, Glas's model 
would give ambiguous results. 

For scales, however, which are not wholistic, but rather incremental, Glas's 
model offers the possibility of splitting or merging the top category without 
changing the meaning of the scale. It is not necessary to know anything of the 
higher categories in order to interpret the meaning of the lower ones. 



4. The McCullagh model for incidental scales 

McCullagh (1980) presents the "proportional odds" model for rating scales in a 
number of versions. The version which is of interest here is that which is 
analogous to the Rasch dichotomous model. This model can be written as: 

$$\log\left(\frac{\sum_{k=j}^{J} P_{nik}}{\sum_{k=0}^{j-1} P_{nik}}\right) = B_n - D_i - F_j \quad \text{for } j = 1, \ldots, J \qquad (6)$$

Thus every category boundary is considered to be equivalent to a dichotomous 
item, not just for the adjacent categories, as in the Andrich model, but for all 
the categories. The rating scale is conceptualized as being based on parallel 
logistic ogives, Figure 4, rather than the non-parallel ogival shapes resulting 
from Andrich's model (cf. Figure 7). 

For $J = 1$, this is also the Rasch dichotomous model. For polytomous scales, 
however, the probability of scoring in any intermediate category is given by 

$$P_{nij} = \frac{\exp(B_n - D_i - F_j)}{1 + \exp(B_n - D_i - F_j)} - \frac{\exp(B_n - D_i - F_{j+1})}{1 + \exp(B_n - D_i - F_{j+1})} \qquad (7)$$

meaning that the probability of an observation in category $j$ is the probability 
of succeeding on a dichotomous item associated with category $j$, less the 
probability of succeeding on one associated with category $j+1$. Thus, since 
$P_{nij} \ge 0$, then necessarily $F_{j+1} \ge F_j$, so that the parameters for the 
ogives are monotonic with the category ordering. 
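
The difference-of-ogives form in (7) can be sketched in a few lines. The 
function names and example values below are hypothetical; the extreme categories 
are handled by treating the probability of scoring at or above category 0 as 1 
and at or above category $J+1$ as 0. 

```python
import math

def logistic(x):
    """Dichotomous Rasch ogive."""
    return 1.0 / (1.0 + math.exp(-x))

def mccullagh_probs(b, d, thresholds):
    """Probabilities of categories 0..J under the proportional odds form (6).

    thresholds holds F_1..F_J in non-decreasing order, with
    P(X >= j) = logistic(b - d - F_j).
    """
    if any(hi < lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be non-decreasing")
    at_or_above = [1.0] + [logistic(b - d - f) for f in thresholds] + [0.0]
    # Equation (7): each category is the gap between two adjacent ogives.
    return [at_or_above[j] - at_or_above[j + 1]
            for j in range(len(thresholds) + 1)]

# Hypothetical person, item and thresholds for a 4-category scale (0..3).
print([round(p, 3) for p in mccullagh_probs(0.5, 0.0, [-1.0, 0.0, 1.0])])
```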

This model has the desirable property that reversing the category numbering 
maintains the scaling system, but it is not strictly a Rasch measurement model 
since it lacks sufficient statistics. 

An advantage of this model is that redefining the rating scale by merging or 
splitting categories does not change the frame of reference. Consequently 
initial calibrations and measures can be estimated by bisecting the rating scale, 
forming a group of higher categories, and another group of lower categories, and 
then scoring all items as dichotomies based on which group the observed rating 
belongs to. More precise estimates can then be obtained by successively 
bisecting each of the groups, until the groups each comprise one category of the 
rating scale. The stability of the measures across bisections is an indication 
of the fit of the model. 
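
The successive bisection can be sketched as below. The text does not say where 
to cut a group with an odd number of categories, so the midpoint rule used here 
is an assumption, and both function names are hypothetical. 

```python
def dichotomize(ratings, cut):
    """Score each rating 1 if it is in category `cut` or above, else 0."""
    return [1 if r >= cut else 0 for r in ratings]

def successive_cuts(top_category):
    """Order the boundaries 1..top_category by repeated bisection."""
    order = []
    groups = [(0, top_category)]  # (lowest, highest) category in each group
    while groups:
        next_groups = []
        for low, high in groups:
            if low == high:
                continue  # a single category cannot be split further
            cut = (low + high + 1) // 2  # lowest category of the upper half
            order.append(cut)
            next_groups += [(low, cut - 1), (cut, high)]
        groups = next_groups
    return order

ratings = [0, 2, 3, 5, 1, 4]  # hypothetical observations on a 0..5 scale
for cut in successive_cuts(5):
    print("boundary below category", cut, dichotomize(ratings, cut))
```

Each dichotomized data set could then be calibrated with a dichotomous Rasch 
procedure, and the measures compared across cuts. 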

This model is advantageous if the category boundaries are entirely arbitrary, so 
that instituting a category boundary at one position is just as good as another. 
Further, the category boundaries are independent, so that the presence or absence 
of one does not affect calibrations based on an adjacent boundary, apart from 
estimation considerations. 



II. Modelling and communicating the intentional form of the scale 

The rating scale models considered here, and others (e.g. Samejima 1972), have 
the drawback that the calibrations related to the categories may not be 
immediately comprehensible to the developer. To illustrate the problem and to 
provide a basis for some graphical solutions, Table 2 and Table 3 present the 
rating scale calibrations for two structurally similar rating scales analyzed 
using the Andrich model. The Figures in this section were excerpted from the 
output of the BIGSCALE (Wright et al. 1989) Rasch analysis computer program. 

Consider a scale in which each category clearly represents a qualitatively higher 
level of the variable. The test developer is thinking in terms of Figure 1. The 
calibrations for such a scale are presented in numerical form in Table 2. The 
step calibrations in column 2 are in ascending numerical sequence and can be 
thought of as the transition points in Figure 2. In fact, the Andrich 
calibrations correspond to the points in Figure 5 at which the probability curve 
for each category intersects with the curve for the category below it, indicated 
by "+" signs. 

Table 3 presents the calibrations for a less clearly defined scale. The Andrich 
step calibrations are no longer in sequence with the categories. The matching 
category probability curves are shown in Figure 6. The correspondence between 
the calibrations and the intersection points is still the same, being the points 
of intersection between the curve for one category and that for the category 
below it, marked by "+" signs. Since, however, each category is not in turn 
the most probable, the curves do not form a procession of "hills". This means 
that the intersection points are disordered with respect to the category numbers. 
This disordering of intersection points is true for all rating scale models, but 
is reflected in different manners by the parameters. 

Rather than consider the probability of any individual category, the cumulative 
probability curves, or "zone" curves (Masters 1980), corresponding to the 
probability of observing that category or below, could be drawn. These are shown 
in Figure 7 for the clearly-defined scale. Disordering of the Andrich 
calibrations is reflected by the close proximity of some of the cumulative curves 
in Figure 8. For the McCullagh model, these curves would always be parallel 
logistic ogives, the complement of Figure 4, and the points of intersection 
between the ogives and the .5 probability line would be the McCullagh category 
calibrations, a very clear pictorial representation of the scale. The Andrich 
and Glas models can also be depicted as plots of parallel logistic ogives, in 
which again the .5 probability line intersects each ogive at the category 
calibration point, but these plots are more tortuously related to the expected 
responses, and, through them, to the empirical scores. 

An alternative approach to the scale is not in terms of the occurrence of any 
particular category, but in terms of what score the person is expected to make 
on the item. For the calibrations in Tables 2 and 3, these are shown by the 
ogives in Figures 9 and 10. For the models considered here, the score ogives 
must be monotonic ascending. They are calculated by summing the products of the 
category number and its probability for each point on the variable. Disordering 
of Andrich calibrations is reflected by marked changes in slope of the ogive. 
The sectors of the ogive which, when rounded to the nearest integer, correspond 
to each of the possible expected scores on the item, are indicated by bands. 
The bands indicate the point at which the value of the expected score 
corresponds exactly to a category number. 
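
The score ogive calculation described above can be sketched for the Andrich 
form as follows. The threshold values are hypothetical disordered calibrations 
chosen for illustration, not the values of Table 3, and the function names are 
assumptions. 

```python
import math

def category_probs(measure, thresholds):
    """Andrich-form probabilities of categories 0..J at B - D = measure."""
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + measure - f)
    top = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(measure, thresholds):
    """Sum over categories of category number times its probability."""
    probs = category_probs(measure, thresholds)
    return sum(j * p for j, p in enumerate(probs))

disordered = [-1.0, -2.0, 1.5, 0.5]  # hypothetical disordered thresholds
grid = [x / 2 for x in range(-12, 13)]
ogive = [expected_score(x, disordered) for x in grid]
for x, e in zip(grid, ogive):
    print(f"{x:5.1f}  expected score {e:4.2f}  nearest category {round(e)}")
```

Even with disordered thresholds the ogive rises monotonically, because its slope 
at any point equals the variance of the score at that point, which is positive. 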

In many respects, Figure 1 is conceptualized by the developer in terms of 
observed score intervals, rather than category probabilities, so that the 
expected score bands in Figures 9 and 10 most closely correspond to the 
idealization in Figure 2. The numerical details of the integral expected score 
intervals are presented in the right-hand columns of Tables 2 and 3. These score 
interval calibrations can then be compared with the person measures and item 
calibrations to determine what category score the person is expected to achieve 
on the item. The score ogives for all the models have much the same appearance. 



Incorporating the developer's intentions 

The conventional Rasch-based analysis of rating scales is based on the premise 
that nothing is known, a priori, about the structure of the rating scale apart 
from the fact that numerically higher rating scale categories represent "more" 
of the latent variable. The general approach (e.g. HSCALE, 1986) is to collapse 
a scale into ascending ordinally counted categories and estimate the calibrations 
of the steps between the observed categories strictly on the basis of the 
observations in the data set at hand. For many applications this is sufficient 
to lead to useful calibration of the rating scale structure. 

In examining and describing the models to this point, the relationship between 
the empirical data and the developer's intentions has been down-played. The 
empirical data, however, always depart to some extent from the developer's ideal. 
Consequently, parameter estimates obtained from the data only give an 
approximation to the rating scale structure. For instance, no observations of 
a particular category may occur at all in the data set under examination, though 
that category has good theoretical grounds to exist, and has been observed in 
other data sets. 



It is rarely the case in modelling rating scales that the developer is interested 
in describing a particular set of empirical data as precisely as possible. 
Rather, the empirical data being used to calibrate or validate the scale are 
known to be but one manifestation of the use to which the scale is to be put, 
and so long as the data do not markedly challenge the developer's intentions, 
it is those intentions which are superordinate, not the characteristics of the 
particular data set being analyzed. The next stage, therefore, is to constrain 
the step calibrations in accord with the developer's intentions, which are always 
an idealization of the scale. The fit of the data to the resulting model will 
indicate the degree to which the ideal is challenged by the actual. 

The concept of modelling the developer's intentions by means of algebraic 
relationships between the category calibrations is well known. Rasch 
(1960/1980), Wright and Masters (1982), Masters and Wright (1984) present certain 
scale structures which could be regarded as rating scales, such as Poisson counts 
and counts of successful Bernoulli trials. The concept of choosing the most 
useful model is also well established. If it is not clear whether all the trials 
involve situations of equal difficulty then a decision must be made as to whether 
to fit the data to a Bernoulli model or a more general rating scale model. 

Anchoring the scale calibrations 

The most extreme distortion to a rating scale in an empirical data set occurs 
when a category is not observed. The missing category can be forced into the 
analysis and calibrated, merely by including in the analysis a dummy data record 
containing such an observation. This will lessen the distortion introduced into 
the frame of reference by the omission of the category, and so improve the 
overall quality of all the calibrations, but it is unlikely to lead to an 
accurate set of calibrations for the rating scale. 

A more useful approach to distortion of the rating scale, for whatever reason, 
may be to pre-set or anchor the rating scale calibrations. If the rating scale 
is well understood, it is likely that a useful set of calibrations for the scale 
has already been obtained. These can be forced into the analysis, and the degree 
to which the data reflect the mandated structure can be determined by means of 
fit statistics and residuals. Some analysis software, e.g. FACETS (Linacre, 
1988) and BIGSCALE, permits the pre-calibration ("anchoring") of category 
calibrations for both observed and unobserved categories. 
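
How residuals report on anchored calibrations can be sketched as follows, 
assuming the Andrich form and assuming that person-minus-item measures are 
already available. This is not the procedure of FACETS or BIGSCALE; the function 
names, the anchor values and the observations are all hypothetical. The summary 
used is the mean of the squared standardized residuals, one common choice of 
fit statistic in Rasch practice. 

```python
import math

def category_probs(measure, thresholds):
    """Andrich-form probabilities of categories 0..J at B - D = measure."""
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + measure - f)
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def standardized_residual(observed, measure, anchored_thresholds):
    """(observed - expected) / model standard deviation of the rating."""
    probs = category_probs(measure, anchored_thresholds)
    expected = sum(j * p for j, p in enumerate(probs))
    variance = sum((j - expected) ** 2 * p for j, p in enumerate(probs))
    return (observed - expected) / math.sqrt(variance)

def mean_square(observations, anchored_thresholds):
    """Mean squared standardized residual over (rating, measure) pairs."""
    squares = [standardized_residual(x, m, anchored_thresholds) ** 2
               for x, m in observations]
    return sum(squares) / len(squares)

anchors = [-2.0, -1.0, 1.0, 2.0]  # hypothetical anchored step calibrations
data = [(0, -2.5), (1, -1.0), (2, 0.0), (3, 1.2), (4, 2.8), (4, -1.5)]
print(round(mean_square(data, anchors), 2))
```

Values of the mean-square far above 1 would suggest that the observations 
depart from the anchored structure more than the model predicts. 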



More general structural concepts 

Scale designers may intend to construct their scales in terms of "equal 
interval", "skewed", or "clustered" categories. Operationalizing these ideas 
mathematically, however, is a considerable challenge. 

If the design of the rating scale was intended to meet some goal (e.g. the scale 
is to be "equal interval"), the analyst may wish to assert this in the scale 
calibrations, both to estimate such an "equal interval" scale, and to force any 
conflict between the design intention and the observed data to manifest itself 
in fit statistics. In this way, it can be determined whether the empirical data 
contradict or support the intended design of the scale. 

Andrich (1978) includes in his discussion the case of rating scales with 
"equidistant thresholds". However, his thresholds are conceptualised in terms 
of the intersections of the probability curves, shown in Figure 5. These may not 
correspond to what the scale designer considers to be "equal interval" in terms 
of Figure 9. Nevertheless, once the external considerations have been reduced 
to a mathematical expression involving Rasch rating scale parameters, such as 
Andrich's equidistant threshold model, it is relatively straightforward to 
construct estimation equations. Their form will be close to those given in 
Wright and Masters (1982). Suitable fit statistics can also be calculated to 
report how significantly the data diverge from the intended design model for the 
scale. 



A sample design problem 

An example is now presented of a number of ways in which a notionally "equal 
interval" scale could be parameterized for the Andrich model. The intention here 
is to indicate to the analyst the nature of the information needed from the 
designer in order to be able to put into explicit mathematical form the scale 
designer's conceptualization of the rating scale. A more complex design would 
yield an even greater number of possible mathematical realisations. 

i) The equal probability thresholds of adjacent categories are at equal 
intervals (Andrich's equidistant threshold case). 

Using the Rasch rating scale parameterization: 

$$\log\left(\frac{P_{nij}}{P_{ni(j-1)}}\right) = B_n - D_i - F_j \qquad (8)$$

in which $F_0 = 0$ and $\sum_j F_j = 0$, where $j = 0, \ldots, J$. 

Then an equal interval scale would be one in which $F_j - F_{j-1} = C$, a constant 
across all $j$, except the extreme. Then 

$$F_j = C\left((j - 1) - \frac{J - 1}{2}\right) \qquad (9)$$

With $C = 1$, such a set of $\{F_j, j = 1, \ldots, 5\}$ would be -2, -1, 0, 1, 2, 
producing Figure 11. The value, $C$, could be either pre-set or estimated from 
the data. If the thresholds according to the empirical data are very 
disordered, then the estimate of $C$ could be negative, indicating that only the 
extreme categories are ever the most probable to be observed. 
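
The equidistant thresholds of equation (9) can be generated directly. The 
function name is hypothetical; the formula is the one stated above. 

```python
def equidistant_thresholds(top_category, spacing):
    """F_j = C * ((j - 1) - (J - 1) / 2) for j = 1..J, as in equation (9)."""
    j_max = top_category
    return [spacing * ((j - 1) - (j_max - 1) / 2) for j in range(1, j_max + 1)]

print(equidistant_thresholds(5, 1.0))   # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(equidistant_thresholds(5, -0.5))  # a negative C reverses the threshold order
```

The thresholds sum to zero for any spacing, in keeping with the constraint 
attached to (8). 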



ii) If the rating scale is intended to represent counts of successes on 
similar, exchangeable, tasks, then it can be represented by a Bernoulli trials 
model. The Bernoulli trials model for a 6 category scale yields a rating scale 
of the probability structure shown in Figure 12, with parameter values, following 
Wright and Masters (1982 p. 51), of $\{F_j, j = 1, \ldots, 5\}$ = -1.61, -.69, 0, .69, 1.61. 
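
These values can be reproduced from the binomial coefficients. For $m$ 
exchangeable trials, the ratio of the binomial coefficients for $j$ and $j-1$ 
successes is $(m - j + 1)/j$, which in the adjacent-category form of (8) gives 
$F_j = \log(j / (m - j + 1))$. The sketch below assumes that derivation; the 
function name is hypothetical. 

```python
import math

def bernoulli_thresholds(trials):
    """Thresholds implied by counting successes on `trials` equal trials."""
    return [math.log(j / (trials - j + 1)) for j in range(1, trials + 1)]

# Five trials give the six categories 0..5.
print([round(f, 2) for f in bernoulli_thresholds(5)])
# [-1.61, -0.69, 0.0, 0.69, 1.61]
```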

iii) The maximum probability points of non-extreme categories are at equal 
intervals, i.e. the "hill tops" in Figure 13 are equally spaced. The condition 
is: 

$$\frac{\partial}{\partial X_k}\left[\frac{\exp\left(j X_k - \sum_{h=1}^{j} F_h\right)}{\text{(standardizing factor)}}\right] = 0 \quad \text{for category } j = k \qquad (10)$$

when $X_k - X_{k-1} = C$, a constant across all $k$, except the extremes. 

Such a set of $\{F_j, j = 1, \ldots, 5\}$ is -2, -1.24, 0, 1.24, 2. 



iv) The points on the variable where the expected score is equal to the category 
value are equally spaced, i.e. the bands in Figure 14 are equally spaced. 
The condition is: 

$$\sum_{j=1}^{J} \frac{j \exp\left(j X_k - \sum_{h=1}^{j} F_h\right)}{\text{(standardizing factor)}} = k \qquad (11)$$

when $X_k - X_{k-1} = C$, a constant across all $k$, except the extremes. 

Such a set of $\{F_j, j = 1, \ldots, 5\}$ is -2, -1.24, 0, 1.24, 2. 

For the Andrich model, these points on the variable are also the points of 
maximum probability for the form modelled in (iii). 
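
The coincidence of conditions (iii) and (iv) under the Andrich model can be 
checked numerically. The sketch below locates, by bisection, the point where 
the expected score equals a category value, and then confirms that the 
probability of that category is no higher a small step to either side. The 
function names, search interval and step size are assumptions made for the 
illustration. 

```python
import math

def category_probs(x, thresholds):
    """Andrich-form probabilities of categories 0..J at measure x."""
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + x - f)
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(x, thresholds):
    return sum(j * p for j, p in enumerate(category_probs(x, thresholds)))

def point_with_expected_score(target, thresholds, low=-20.0, high=20.0):
    """Bisection on the monotonic score ogive; assumes the root lies in range."""
    for _ in range(200):
        mid = (low + high) / 2
        if expected_score(mid, thresholds) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

thresholds = [-2.0, -1.24, 0.0, 1.24, 2.0]  # the set quoted for (iii) and (iv)
points = [point_with_expected_score(k, thresholds) for k in range(1, 5)]
for k, x in zip(range(1, 5), points):
    here = category_probs(x, thresholds)[k]
    left = category_probs(x - 0.05, thresholds)[k]
    right = category_probs(x + 0.05, thresholds)[k]
    print(k, round(x, 3), here >= left and here >= right)
print("spacings:", [round(b - a, 3) for a, b in zip(points, points[1:])])
```

The reason is that the slope of a category's probability curve is its 
probability multiplied by the difference between the category number and the 
expected score, so the slope vanishes exactly where the two are equal. 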



v) The equal intervals are intended to represent uniform spacing of the levels 
representing equal probabilities of being scored in or above a certain category. 
The condition is: 

$$\sum_{j=0}^{k} \frac{\exp\left(j X_k - \sum_{h=0}^{j} F_h\right)}{\text{(standardizing factor)}} = 0.5 \qquad (12)$$

when $X_k - X_{k-1} = C$, a constant across all $k$, except the extremes. 

A set of parameters is $\{F_j, j = 1, \ldots, 5\}$ = -2, -1.22, 0, 1.22, 2, depicted 
in Figure 15. 

For the Andrich model, these parameters are close to those for options (iii) and 
(iv). For this same constraint applied to the McCullagh model, the parameter 
values would be 

$\{F_j, j = 1, \ldots, 5\}$ = -2, -1, 0, 1, 2. 



vi) The half-point expected score thresholds are equally spaced, i.e. the "j" 
bands in Figure 16 are equally spaced. The condition is: 

$$\sum_{j=1}^{J} \frac{j \exp\left(j X_k - \sum_{h=1}^{j} F_h\right)}{\text{(standardizing factor)}} = k + 0.5 \qquad (13)$$

when $X_k - X_{k-1} = C$, a constant across all $k$, except the extremes. 

Such a set of $\{F_j, j = 1, \ldots, 5\}$ is -2, -1.77, 0, 1.77, 2. 


III. Conclusions and implications 

Analysis of rating scales has tended to ignore the intentions of the designer of 
the scale. Thus it has not been possible to answer the question "Does the 
empirical data support or refute the hypothesis that the rating scale is 
functioning in accord with the intentions of its designer?" The challenge to 
the analyst is to discern the designer's intentions and to convert them into the 
mathematical model which can most usefully advance the understanding of the 
scale. 

Rating       Category-Items      Length of 
(Score)       1    2    3        item "test" 

   0          0    N    N            1 
   1          1    0    N            2 
   2          1    1    0            3 
   3          1    1    1            3 

Table 1. Scoring of a 4 category item according to the Glas model. 
"N" indicates that the category-item is considered not to be 
administered. 



CATEGORY    STEP       STEP          EXPECTED SCORE CALIBRATIONS 
LABEL       CALIBR.    ERROR         STEP-.5    AT STEP    STEP+.5 

0  NONE                                         EXTREME     -3.69 
1           -3.50      [illegible]    -3.69      -2.31      -1.19 
2           -1.00       .04           -1.19       -.25        .69 
3             .50       .03             .69       1.81       3.19 
4            3.00       .20            3.19     EXTREME 

Table 2. Calibrations for an empirically clearly-defined rating 
scale fitted to the Andrich model. The step calibrations are in 
ascending sequence. 



CATEGORY    STEP       STEP          EXPECTED SCORE CALIBRATIONS 
LABEL       CALIBR.    ERROR         STEP-.5    AT STEP    STEP+.5 

0  NONE                                         EXTREME     -2.23 
1           -1.00      [missing]      -2.23      -1.51       -.84 
2           -2.00       .06            -.84        .15        .98 
3            3.00       .02             .98       1.53       2.13 
4             .00       .10            2.13     EXTREME 

Table 3. Calibrations for an empirically ill-defined rating scale 
fitted to the Andrich model. The step calibrations are not in 
ascending sequence. 

Figure 1. A scale developer's idealized conception of a rating 
scale. 

Figure 2. The rating scale expressed in terms of performance on 
the underlying variable. 

Figure 3. A simple ogive representing a dichotomy, expressing the relationship 
between a measure and an expected score. 

Figure 5. Probability of responding in each category of the clearly-defined 
Andrich scale for a person whose measure is indicated below the x-axis. The 
points of equal probability of adjacent categories are shown by "+". 

Figure 6. Probability of responding in each category of the ill-defined Andrich 
scale for a person whose measure is indicated below the x-axis. The points of 
equal probability of adjacent categories are shown by "+", and are disordered 
with respect to the categories. 

Figure 7. Probability of responding in a given category or below, for the 
clearly-defined scale, according to the Andrich model. 

Figure 8. Probability of responding in a given category or below, for the 
ill-defined scale, according to the Andrich model. 

Figure 9. Expected score ogive for the clearly-defined scale. 

Figure 10. Expected score ogive for the ill-defined scale. 

Figure 11. An "equal-interval" scale interpreted as equidistant intersections 
of adjacent category probability curves. The intersections are marked. 

Figure 12. An "equal-interval" scale interpreted as counts of successes on 
Bernoulli trials. 

Figure 13. An "equal-interval" scale interpreted as equally spaced maxima of the 
intermediate category probability curves. The maxima are marked. 

Figure 14. An "equal-interval" scale interpreted as equally spaced integral 
expected score values, which are indicated by the vertical lines. 

Figure 15. An "equal-interval" scale interpreted as equal spacing of the 
intersections between the cumulative probability curves and the 0.5 probability 
line. 

Figure 16. An "equal-interval" scale interpreted as equal spacing of the 
thresholds between adjacent rounded expected score intervals, indicated by "j" 
lines. 






